Integrating Multiple Internet Directories by Instance-based Learning

Authors

  • Ryutaro Ichise
  • Hideaki Takeda
  • Shinichi Honiden
Abstract

Finding desired information on the Internet is becoming increasingly difficult. Internet directories such as Yahoo!, which organize web pages into hierarchical categories, provide one solution to this problem; however, such directories are of limited use because some bias is applied both in the collection and categorization of pages. We propose a method for integrating multiple Internet directories by instance-based learning. Our method provides the mapping of categories in order to transfer documents from one directory to another, instead of simply merging two directories into one. We present herein an effective algorithm for determining similar categories between two directories via a statistical method called the κ-statistic. In order to evaluate the proposed method, we conducted experiments using two actual Internet directories, Yahoo! and Google. The results show that the proposed method achieves extensive improvements relative to both the Naive Bayes and Enhanced Naive Bayes approaches, without any text analysis on documents.
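
The abstract does not spell out how the κ (kappa) statistic is applied, so the following is only a minimal sketch of the general idea, assuming the standard Cohen's kappa computed on a 2×2 table of documents (URLs) that appear in both directories. The function names `kappa` and `best_match`, the set-of-URLs representation, and the choice to return 0.0 in degenerate cases are all illustrative assumptions, not the authors' algorithm.

```python
from typing import Dict, Set, Tuple


def kappa(cat_a: Set[str], cat_b: Set[str], shared: Set[str]) -> float:
    """Cohen's kappa for the agreement between two categories.

    `shared` is the set of documents listed in BOTH directories (assumption:
    only these shared instances are used to compare categories).
    """
    a = cat_a & shared
    b = cat_b & shared
    n = len(shared)
    if n == 0:
        return 0.0  # no shared instances, nothing to compare (assumed convention)

    n11 = len(a & b)               # in both categories
    n10 = len(a - b)               # only in category A
    n01 = len(b - a)               # only in category B
    n00 = n - n11 - n10 - n01      # in neither category

    p_observed = (n11 + n00) / n
    p_expected = ((n11 + n10) * (n11 + n01) + (n01 + n00) * (n10 + n00)) / (n * n)
    if p_expected == 1.0:
        return 0.0  # agreement is forced by the margins (assumed convention)
    return (p_observed - p_expected) / (1.0 - p_expected)


def best_match(source_cat: Set[str],
               target_dir: Dict[str, Set[str]],
               shared: Set[str]) -> Tuple[str, float]:
    """Return the target category with the highest kappa (target_dir must be non-empty)."""
    scores = {name: kappa(source_cat, docs, shared) for name, docs in target_dir.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]
```

With six shared URLs `u1` to `u6`, a source category `{u1, u2, u3}` and a target category `{u1, u2, u4}` give κ = 1/3: the observed agreement is 4/6 and the chance agreement is 1/2. Note that no page text is examined, which matches the abstract's point that the approach works without text analysis of documents.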

Similar resources

Interactive Retrieval of Nature Images Using Multiple-Instance Learning

Content-based image retrieval (CBIR) has received considerable research interest in recent years. The basic problem in CBIR is the semantic gap between the high-level image semantics and the low-level image features. Region-based image retrieval and learning from user interaction through relevance feedback are two main approaches to solving this problem. Recently, the research in integra...

An Examination of the Relationships between Internet Directories

Finding desired information on the Internet is becoming increasingly difficult. Internet directories such as Yahoo!, which organize web pages into hierarchical categories, provide one solution to this problem; however, such directories are of limited use because some bias is applied both in the collection and categorization of the pages. Therefore, we propose a method for integrating multiple in...

Identifying Predictive Structures in Relational Data Using Multiple Instance Learning

This paper introduces an approach for identifying predictive structures in relational data using the multiple-instance framework. By a predictive structure, we mean a structure that can explain a given labeling of the data and can predict labels of unseen data. Multiple-instance learning has previously only been applied to flat, or propositional, data and we present a modification to the framew...

Automated Alignment of Multiple Internet Directories

Directory services are tools for making useful information more accessible, but individual Internet directories in directory services are of limited use in finding user-relevant web pages. In this paper, we propose a method for aligning URL information from one Internet directory to another. This method can discover an appropriate position in a directory for a web page which is not shown in that ...

Publication date: 2003